Abstract

We consider greedy contention managers for transactional memory for $M \times N$ execution windows of
transactions with $M$ threads and $N$ transactions per thread. Assuming that each transaction conflicts with
at most $C$ other transactions inside the window, a trivial greedy contention manager can schedule them
within $CN$ time. In this paper, we show that there are much better schedules. We present and analyze
two new randomized greedy contention management algorithms. The first algorithm, Offline-Greedy,
produces a schedule of length $O(C + N \log(MN))$ with high probability, and gives competitive ratio
$O(\log(MN))$ for $C \le N \log(MN)$. The offline algorithm depends on knowing the conflict graph.
The second algorithm, Online-Greedy, produces a schedule of length $O(C \log(MN) + N \log^2(MN))$
with high probability, which is only an $O(\log(MN))$ factor worse, but does not require knowledge of the
conflict graph. We also give an adaptive version which achieves similar worst-case performance and in which $C$
is determined on the fly during execution. Our algorithms provide new tradeoffs for greedy transaction
scheduling that parameterize window sizes and transaction conflicts within the window.

1 Introduction

Multi-core architectures present both an opportunity and a challenge for multi-threaded software. The
opportunity is that threads will be available to an unprecedented degree, and the challenge is that more
programmers will be exposed to concurrency-related synchronization problems that until now were of concern
only to a select few. Writing concurrent programs is difficult because of the complexity of ensuring proper
synchronization. Conventional lock-based synchronization suffers from well-known limitations, so researchers
have considered non-blocking transactions as an alternative. Software Transactional Memory [13] [7] [8] systems
use lightweight and composable in-memory software transactions to address concurrency in multi-threaded
systems, ensuring safety at all times [5] [6].

A contention management strategy is responsible for ensuring that the STM system as a whole makes progress. If
transaction $T$ discovers it is about to conflict with $T'$, it has two choices: it can pause, giving $T'$ a chance to
finish, or it can proceed, forcing $T'$ to abort. To solve this problem efficiently, $T$ consults the contention
manager module about which choice to make. Of particular interest are greedy contention managers, where a
transaction starts again immediately after every abort. Several (greedy) contention managers have been
proposed in the literature. However, most contention managers have been assessed only experimentally
on specific benchmarks. There is a small amount of work in the literature which formally analyzes the
performance of contention managers. The competitive ratio results are not encouraging since the bounds are
not tight. For example, with respect to the $O(s)$ bound in [2], when the number of resources increases, the
performance degrades linearly. A question arises whether one can achieve tighter bounds. A difficulty
in obtaining tight bounds is that the algorithms studied in prior work (e.g., [4] [3] [13]) apply to the one-shot
scheduling problem, where each thread issues a single transaction. One-shot problems can be related to graph
coloring. It can be shown that the problem of finding the chromatic number of a graph can be reduced to finding an
optimal schedule for a one-shot problem. Since it is known that graph coloring is a very hard problem to
approximate, the one-shot problem is very hard to approximate too [14].

In order to obtain better formal bounds, we propose to investigate execution windows of transactions
(see the left part of Figure 1), which have the potential to overcome the limitations of coloring in certain
circumstances. An $M \times N$ window of transactions $W$ consists of $M$ threads with an execution sequence
of $N$ different transactions per thread. Let $C$ denote the maximum number of conflicting transactions for
any transaction in the window ($C$ is the maximum degree of the respective conflict graph of the window). A
straightforward upper bound is $\min(CN, MN)$: $CN$ follows from the observation that each transaction
in a thread may be delayed at most $C$ time steps by its conflicting transactions, and $MN$ follows from
the serialization of the transactions. If we partition the window into $N$ one-shot transaction sets, each of size
$M$, then the competitive ratio using the one-shot analysis results is $O(sN)$. When we use Algorithm
RandomizedRounds [14] $N$ times, the completion time is in the worst case $O(CN \log n)$ (for some
appropriate choice of $n$).

Figure 1: Execution window model for transactional memory. (a) Before execution; (b) After execution.



Our results indicate that we can obtain better bounds under certain circumstances in the window.
We present two randomized greedy algorithms in which transactions are assigned priority values, such that for
some random initial interval at the beginning of the window $W$ each transaction is in low priority mode, and
after the random period expires the transaction switches to high priority mode. In high priority mode a
transaction can only be aborted by other high priority transactions. The random initial delays have the
property that the conflicting transactions are shifted inside their window, so their execution times may not
coincide (see the right part of Figure 1). The benefit is that conflicting transactions can execute in different
time slots and potentially many conflicts are avoided. The benefits become more apparent in scenarios where
conflicts are more frequent between transactions of the same column and less frequent between transactions of
different columns.






Contributions: We propose the contention measure $C$ within the window to allow more precise statements
about the worst-case complexity bound of any contention management algorithm. We give two
window-based randomized greedy algorithms for contention management in any execution window $W$.
Our first algorithm, Offline-Greedy, gives a schedule of length $O(C + N \log(MN))$ with high probability,
and improves on one-shot contention managers from a worst-case perspective. The algorithm is offline in
the sense that it explicitly uses the conflict graph of the transactions to resolve the conflicts. Our second
algorithm, Online-Greedy, produces a schedule of length $O(C \log(MN) + N \log^2(MN))$ with high
probability, which is only a factor of $O(\log(MN))$ worse than Offline-Greedy. The benefit of the
online algorithm is that it does not need to know the conflict graph of the transactions to resolve the conflicts.
The online algorithm uses Algorithm RandomizedRounds [14] as a subroutine. We also give a third
algorithm, Adaptive-Greedy, which is the adaptive version of the previous algorithms; it achieves similar
worst-case performance and adaptively guesses the value of the contention measure $C$.

The technique we use for the analysis of these algorithms is similar to the one used by Leighton et al.
to analyze an online packet scheduling problem. Moreover, one advantage of our algorithms is that if the
conflicts in the window are bounded by $C \le N \log(MN)$, then the upper bounds we have obtained are within
poly-logarithmic factors of optimal, since $N$ is a lower bound for the execution time. By finding window
sizes in the program execution where $C$ is small compared to $N$, our algorithms provide better bounds than
previously known algorithms.
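
To make the comparison between the bounds concrete, the following Python sketch evaluates the three schedule-length expressions discussed above for given $M$, $N$ and $C$. It is only an illustration: the hidden constants of the $O(\cdot)$ notation are set to 1 (an assumption, not something the analysis provides), natural logarithms are used, and the function name `schedule_bounds` is hypothetical.

```python
import math

def schedule_bounds(M, N, C):
    """Evaluate the bound expressions with all hidden constants set to 1 (assumption)."""
    log_mn = math.log(M * N)
    return {
        # trivial bound: min(CN, MN)
        "trivial": min(C * N, M * N),
        # Offline-Greedy: C + N log(MN)
        "offline": C + N * log_mn,
        # Online-Greedy: C log(MN) + N log^2(MN)
        "online": C * log_mn + N * log_mn ** 2,
    }

bounds = schedule_bounds(M=64, N=1000, C=500)
```

With constants ignored, the online expression is exactly the offline expression multiplied by $\log(MN)$, which is the logarithmic gap stated in the contributions.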

We prove the existence of an algorithm based on dynamic programming that finds, in polynomial time, the
optimal decomposition of any arbitrary window $W$ into sub-windows $W_1, \ldots, W_k$, such that the maximum
contention density in each is the smallest possible. The density simply measures how much larger $C$ is with
respect to the number of transactions per thread. By applying our greedy contention management algorithms
in the sub-windows we can obtain schedules which are asymptotically better than executing the algorithm
in the whole window $W$.

Outline of Paper: The rest of the paper is organized as follows. In Section 2 we discuss the related
work. We present the transactional memory model in Section 3. We present and formally analyze an
offline randomized greedy algorithm in Section 4. The online version is given in Section 5. In Section
6 we describe the adaptive version of the aforementioned algorithms. We discuss the issues of window
decomposition for the optimal window generation in Section 7. Section 8 concludes the paper.

2 Related Work 

Transactional Memory (TM) was proposed in the early nineties as an alternative implementation of
mutual exclusion that avoids many of the drawbacks of locks (e.g., deadlock, reliance on the programmer
to associate shared data with locks, priority inversion, and failures of threads while holding locks). A
few years later the term Software Transactional Memory (STM) was suggested by Shavit and Touitou [15],
and a so-called Dynamic STM (DSTM) for dynamic data structures, which uses a contention manager as an
independent module, was proposed. DSTM is a practical obstruction-free STM system that seeks advice
from the contention manager module on whether to wait or abort a transaction at the time of conflict.

Several contention managers have been proposed in the literature. Most of them have been assessed by
specific benchmarks only and not analytically. A comparison of contention managers based on different
benchmarks can be found in [11] [12] [10] [13]. These studies found that the best choice of contention manager
varies with the complexity of the considered benchmark. A more detailed analysis of the performance of
different contention managers on complex benchmarks has recently been given by Ansari et al. [1]. From
all the aforementioned references, it has turned out that the coordination cost and the overhead involved
in contention management is very high.

The first formal analysis of the performance of a contention manager was given by Guerraoui et al.,
who presented the Greedy contention manager and proved that it achieves an $O(s^2)$ competitive ratio in
comparison to the optimal off-line scheduler for $n$ concurrent transactions that share $s$ objects. Later,
Guerraoui et al. studied the impact of transaction failures on contention management and proved an
$O(ks^2)$ competitive ratio when some running transaction may abort $k$ times and then eventually commit.
Attiya et al. [2] improved the first result to $O(s)$ and the second to $O(ks)$, which are significant
improvements over the competitive ratio of Greedy. They also proved a matching lower bound of $\Omega(s)$ for
the competitive ratio of deterministic work-conserving algorithms, which schedule as many transactions as
possible.

The complexity measures provided by the aforementioned studies are not satisfying, as they are based
on the number of shared resources only. The total number of shared resources is not really
related to the actual conflicting transactions potentially encountered by a transaction. Recently, Schneider
and Wattenhofer [14] analyzed some of the issues related to the number of potential conflicts. They presented
a deterministic algorithm CommitBounds with competitive ratio $O(s)$ for $n$ concurrent transactions using
$s$ shared resources, and a randomized algorithm RandomizedRounds with makespan $O(C \log n)$ for the
one-shot problem of a set of $M$ transactions in separate threads with $C$ conflicts (assuming unit delays
for transactions), with high probability (proportional to $1 - n^{-1}$). This means that RandomizedRounds is
only a factor of $\log n$ from optimal, with high probability, for the case where $C \le M$. However, if other
transactions come into play that are able to reduce the parallelism by a factor of $k$, the approximation
of RandomizedRounds also worsens by a factor of $k$. While previous studies showed that the contention
managers Polka [11] and SizeMatters [10] exhibit good overall performance on a variety of benchmarks,
this work showed that they may perform exponentially worse than RandomizedRounds from a worst-case
perspective.

3 Execution Window Model 

We consider a model that is based on an $M \times N$ execution window $W$ consisting of a set of transactions
$W = \{(T_{11}, \ldots, T_{1N}), (T_{21}, \ldots, T_{2N}), \ldots, (T_{M1}, \ldots, T_{MN})\}$ executed by the $M$ threads
running on $M$ processors $P_1, \ldots, P_M$, where each thread issues $N$ transactions in a sequence. For
simplicity of the analysis we assume that a single processor runs one thread only, i.e., in total at most $M$
threads are running concurrently. A thread running on processor $P_i$ executes transactions
$T_{i1}, \ldots, T_{iN}$ one after the other, and transaction $T_{ij}$ is executed as soon as $T_{i(j-1)}$ has
completed or committed.

Transactions share a set of objects $\{O_1, \ldots, O_s\}$. Each transaction $T_{ij}$ may use at most $s$ different
objects. Each transaction is a sequence of actions, each of which is either a read of some shared resource $O_i$,
a write to some shared resource $O_k$, a commit, or an abort. Concurrent write-write actions or read-write
actions on shared objects by two or more transactions cause conflicts between transactions. Each transaction
completes with a commit when each action is performed without conflicts. If conflicts occur then a transaction
either aborts, or it may commit and force all other conflicting transactions to abort. In a greedy schedule, if a
transaction aborts then it immediately attempts to execute again until it commits.

Each transaction $T_{ij}$ has execution time duration $\tau_{ij}$, which is greater than 0. Here, for simplicity, we
assume that $\tau_{ij} = 1$, i.e., each transaction needs one time unit to execute. We also assume that the execution
of the transactions starts at time 0 and the execution time advances synchronously for all threads step by
step. We also assume that all transactions inside the execution window are correct, i.e., there are no faulty
transactions. Our results can be extended by relaxing these assumptions.

The makespan of a schedule for a set of transactions $\Gamma$ is defined as the duration from the start of the
schedule, i.e., the time when some transaction $T_{ij} \in \Gamma$ is available for scheduling, until all transactions in
$\Gamma$ have committed. The makespan of the transaction scheduling algorithm for the sequences of transactions
can be compared to the makespan of an optimal off-line scheduling algorithm, which is denoted by OPT. We
evaluate the efficiency of our new contention management algorithms by comparing their makespan with
the makespan of the optimal off-line scheduler.

Definition 1 (Competitive Ratio) The competitive ratio of the combination $(A, \Gamma)$ for a contention
management algorithm $A$ under a set of jobs $\Gamma$ is defined as

$$CR(A, \Gamma) = \frac{\mathrm{makespan}(A, \Gamma)}{\mathrm{makespan}(\mathrm{OPT}, \Gamma)}.$$

Conflict Graph: For a set of transactions $V \subseteq \Gamma$, we use the notion of conflict graph $G = (V, E)$. The
neighbors of a transaction $T$ in the conflict graph are denoted by $N_T$ and represent all transactions that have
a conflict with $T$ in $G$. The degree $d_T$ of $T$ in the graph corresponds to the number of its neighbors in the
conflict graph, i.e., $d_T = |N_T|$. Note that $d_T < |V|$. The congestion $C$ of the window $W$ is the largest degree of
the conflict graph $G' = (W, E')$, which consists of all the transactions in the window.
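
The definition of the conflict graph and of the congestion $C$ can be sketched in a few lines of Python. This is an illustrative reading, not code from the paper: it assumes each transaction is summarized by a read set and a write set over shared objects, and that two transactions conflict whenever one writes an object the other reads or writes. The names `conflict_graph`, `congestion` and the sample window are hypothetical.

```python
from itertools import combinations

def conflict_graph(transactions):
    """transactions: dict name -> (read_set, write_set).
    Returns an adjacency dict name -> set of conflicting transaction names."""
    graph = {name: set() for name in transactions}
    for a, b in combinations(transactions, 2):
        reads_a, writes_a = transactions[a]
        reads_b, writes_b = transactions[b]
        # write-write or read-write access to a common object is a conflict
        if writes_a & (reads_b | writes_b) or writes_b & reads_a:
            graph[a].add(b)
            graph[b].add(a)
    return graph

def congestion(graph):
    """C = largest degree in the conflict graph."""
    return max((len(nbrs) for nbrs in graph.values()), default=0)

# Hypothetical 3 x 2 window fragment; objects are plain strings.
window = {
    "T11": ({"x"}, {"y"}),
    "T21": ({"y"}, set()),
    "T31": (set(), {"y", "z"}),
    "T12": ({"z"}, set()),
}
graph = conflict_graph(window)
```

In this example `T31` conflicts with three other transactions, so the congestion of the fragment is 3.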



4 Offline Algorithm 

We present Algorithm Offline-Greedy (Algorithm 1), which is an offline greedy contention resolution
algorithm that uses the conflict graph explicitly to resolve conflicts of transactions. First, we divide the time into
frames of duration $\Phi = O(\ln(MN))$. Then, each thread $P_i$ is assigned an initial time period consisting of
$R_i$ frames (with total duration $R_i \cdot \Phi$), where $R_i$ is chosen randomly, independently and uniformly, from
the range $[0, \alpha - 1]$, where $\alpha = C / \ln(MN)$. Each transaction has one of two priorities, low or high.
Transaction $T_{ij}$ is initially in low priority. Transaction $T_{ij}$ switches to high priority (or normal
priority) in the first time step of frame $F_{ij} = R_i + (j - 1)$ and remains in high priority thereafter until
it commits. The priorities are used to resolve conflicts. A high priority transaction may only be aborted
by another high priority transaction. A low priority transaction is always aborted if it conflicts with a high
priority transaction.
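
The frame and priority rules just described can be sketched as follows. The sketch makes two assumptions that the text does not state: the frame length $\Phi$ and the range size $\alpha$ are rounded up to integers so that they can be used with discrete time steps, and threads are indexed from 0 while transactions within a thread are indexed from 1. All function names are hypothetical.

```python
import math
import random

def frame_length(M, N):
    # Phi = 1 + (e^2 + 2) ln(MN), rounded up to whole time steps (assumption)
    return math.ceil(1 + (math.e ** 2 + 2) * math.log(M * N))

def draw_delays(M, N, C, rng):
    # alpha = C / ln(MN), rounded up and kept at least 1 (assumption)
    alpha = max(1, math.ceil(C / math.log(M * N)))
    # each thread picks R_i uniformly from [0, alpha - 1]
    return [rng.randrange(alpha) for _ in range(M)]

def priority(t, i, j, delays, phi):
    """Priority of the j-th transaction (1-based) of thread i (0-based) at time step t."""
    start_frame = delays[i] + (j - 1)        # F_ij = R_i + (j - 1)
    return "low" if t < start_frame * phi else "high"
```

For example, with `delays = [0, 2]` and `phi = 10`, the first transaction of thread 1 stays in low priority for time steps 0 to 19 and is in high priority from step 20 on.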

Let $G_t$ denote the conflict graph of transactions at time $t$, where each transaction corresponds to a node
and two transactions are connected with an edge if they conflict in at least one shared resource. Note that
the maximum degree of $G_t$ is bounded by $C$ for the transactions in window $W$. At each time step $t$ we
select a maximal independent set of transactions in $G_t$ to commit. We first select a maximal independent set
$I_H$ of high priority transactions, then remove this set and its neighbors from $G_t$, and then select a maximal
independent set $I_L$ of low priority transactions from the remaining conflict graph. The transactions that
commit are $I_H \cup I_L$.
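
One time step of this selection can be sketched in Python as below. The sketch uses a simple greedy scan to obtain a maximal independent set; the text does not prescribe how the maximal independent set is computed, so this choice (and the sorted iteration order) is an assumption, as are the names `maximal_independent_set` and `commit_step`.

```python
def maximal_independent_set(nodes, graph):
    """Greedy scan: add a node if none of its neighbours was already chosen."""
    chosen = set()
    for v in nodes:
        if graph[v].isdisjoint(chosen):
            chosen.add(v)
    return chosen

def commit_step(graph, high, low):
    """graph: dict node -> set of neighbours among the pending transactions.
    Returns the set I_H | I_L of transactions that commit in this time step."""
    i_high = maximal_independent_set(sorted(high), graph)
    # low priority nodes adjacent to a committing high priority node are excluded
    blocked = set().union(*(graph[v] for v in i_high))
    remaining_low = [v for v in sorted(low) if v not in blocked]
    i_low = maximal_independent_set(remaining_low, graph)
    return i_high | i_low

# Hypothetical path a - b - c - d plus an isolated node e.
example_graph = {
    "a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}, "e": set(),
}
committed = commit_step(example_graph, high={"a", "b"}, low={"c", "d", "e"})
```

Here `a` is chosen among the high priority transactions, `b` is excluded as its neighbour, and the low priority transactions `c` and `e` fill the remaining slots.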

The intuition behind the algorithm is as follows. Consider a thread $i$ and its first transaction in the
window, $T_{i1}$. According to the algorithm, $T_{i1}$ becomes high priority at the beginning of frame $F_{i1}$. Because
$R_i$ is chosen at random among $\alpha = C / \ln(MN)$ positions, it is expected that $T_{i1}$ will conflict with at most
$O(\ln(MN))$ transactions which become simultaneously high priority in the same time frame (in $F_{i1}$). Since
the duration of a time frame is $\Phi = O(\ln(MN))$, transaction $T_{i1}$ and all its high priority conflicting
transactions will be able to commit by the end of time frame $F_{i1}$, using the conflict resolution graph. The initial
randomization period of duration $R_i \cdot \Phi$ will have the same effect on the remaining transactions of thread
$i$, which will also commit within their chosen high priority frames.



4.1 Analysis of Offline Algorithm 

We study two classic efficiency measures for the analysis of our contention management algorithm: (a)
the makespan, which gives the total time to complete all the $MN$ transactions in the window; and (b) the
response time of the system, which gives how much time a transaction takes to commit.

Algorithm 1: Offline-Greedy

Input: An $M \times N$ window $W$ of transactions with $M$ threads each with $N$ transactions, where $C$ is the
    maximum number of transactions that a transaction can conflict with within the window;
Output: A greedy execution schedule for the window of transactions $W$;

Divide time into time frames of duration $\Phi = 1 + (e^2 + 2) \ln(MN)$;
Each thread $P_i$ chooses a random number $R_i \in [0, \alpha - 1]$ for $\alpha = C / \ln(MN)$;
foreach time step $t = 0, 1, 2, \ldots$ do
    Phase 1: Priority Assignment;
    foreach transaction $T_{ij}$ do
        $F_{ij} \leftarrow R_i + (j - 1)$;
        if $t < F_{ij} \cdot \Phi$ then
            Priority($T_{ij}$) $\leftarrow$ Low;
        else
            Priority($T_{ij}$) $\leftarrow$ High;
    Phase 2: Conflict Resolution;
    begin
        Let $G_t$ be the conflict graph at time $t$;
        Compute $G_t^H$ and $G_t^L$, the subgraphs of $G_t$ induced by high and low priority nodes, respectively;
        Compute $I_H \leftarrow I(G_t^H)$, a maximal independent set of nodes in graph $G_t^H$;
        $Q \leftarrow$ low priority nodes adjacent to nodes in $I_H$;
        Compute $I_L = I(G_t^L - Q)$, a maximal independent set of nodes in graph $G_t^L$ after removing the nodes in $Q$;
        Commit $I_H \cup I_L$;



According to the algorithm, when a transaction $T_{ij}$ enters the system, it will be in low priority until
$F_{ij}$ starts. As soon as $F_{ij}$ starts, it will enter its respective frame and begin executing in high priority.
Let $A$ denote the set of transactions conflicting with $T_{ij}$. Let $A' \subseteq A$ denote the subset of conflicting
transactions of $T_{ij}$ which become high priority during frame $F_{ij}$ (simultaneously with $T_{ij}$).

Lemma 4.1 If $|A'| \le \Phi - 1$ then transaction $T_{ij}$ will commit in frame $F_{ij}$.

Proof. Due to the use of the high priority independent sets in the conflict graph $G_t$, if at time $t$ during frame
$F_{ij}$ transaction $T_{ij}$ does not commit, then some conflicting transaction in $A'$ must commit. Since there are
at most $\Phi - 1$ high priority conflicting transactions, and the length of the frame $F_{ij}$ is at most $\Phi$, $T_{ij}$ will
commit by the end of frame $F_{ij}$. □

We show next that it is unlikely that $|A'| > \Phi - 1$. We will use the following version of the Chernoff
bound:

Lemma 4.2 (Chernoff bound 1) Let $X_1, X_2, \ldots, X_n$ be independent Poisson trials such that, for
$1 \le i \le n$, $\Pr(X_i = 1) = pr_i$, where $0 < pr_i < 1$. Then, for $X = \sum_{i=1}^{n} X_i$,
$\mu = E[X] = \sum_{i=1}^{n} pr_i$, and any $\delta > e^2$, $\Pr(X > \delta\mu) < e^{-\delta\mu}$.

Lemma 4.3 $|A'| > \Phi - 1$ with probability at most $(1/MN)^2$.

Proof. Let $A_k \subseteq A$, where $1 \le k \le M$, denote the set of transactions of thread $P_k$ that conflict with
transaction $T_{ij}$. We partition the threads $P_1, \ldots, P_M$ into 3 classes $Q_0$, $Q_1$, and $Q_2$, such that:

• $Q_0$ contains every thread $P_k$ for which either $|A_k| = 0$, or $|A_k| > 0$ but the positions of the transactions
in $A_k$ are such that it is impossible to overlap with $F_{ij}$ for any random intervals $R_i$ and $R_k$.

• $Q_1$ contains every thread $P_k$ with $0 < |A_k| < \alpha$, where at least one of the transactions in $A_k$ is positioned
so that it is possible to overlap with frame $F_{ij}$ for some choices of the random intervals $R_i$ and $R_k$.

• $Q_2$ contains every thread $P_k$ with $\alpha \le |A_k|$. Note that $|Q_2| \le C/\alpha = \ln(MN)$.

Let $Y_k$ be a random binary variable, such that $Y_k = 1$ if in thread $P_k$ any of the transactions in $A_k$
becomes high priority in $F_{ij}$ (same frame as $T_{ij}$), and $Y_k = 0$ otherwise. Let $Y = \sum_{k=1}^{M} Y_k$. Note that
$|A'| = Y$. Denote $pr_k = \Pr(Y_k = 1)$. We can write $Y = Z_0 + Z_1 + Z_2$, where $Z_\ell = \sum_{P_k \in Q_\ell} Y_k$ for
$0 \le \ell \le 2$. Clearly, $Z_0 = 0$ and $Z_2 \le |Q_2| \le \ln(MN)$.

Recall that for each thread $P_k$ there is a random initial interval with $R_k$ frames, where $R_k$ is chosen
uniformly at random in $[0, \alpha - 1]$. Therefore, for each $P_k \in Q_1$, $0 < pr_k \le |A_k|/\alpha < 1$, since there are
$|A_k| < \alpha$ conflicting transactions in $A_k$ and there are at least $\alpha$ random choices for the relative position of
transaction $T_{ij}$. Consequently,

$$\mu = E[Z_1] = \sum_{P_k \in Q_1} pr_k \le \sum_{P_k \in Q_1} \frac{|A_k|}{\alpha} = \frac{1}{\alpha} \sum_{P_k \in Q_1} |A_k| \le \frac{C}{\alpha} \le \ln(MN).$$

By applying the Chernoff bound of Lemma 4.2 we obtain that

$$\Pr(Z_1 > (e^2 + 1)\mu) < e^{-(e^2 + 1)\mu} < e^{-2\ln(MN)} = (MN)^{-2}.$$

Since $Y = Z_0 + Z_1 + Z_2$ and $Z_2 \le \ln(MN)$, we obtain $\Pr(|A'| = Y > (e^2 + 2)\mu = \Phi - 1) < (MN)^{-2}$,
as needed. □

Theorem 4.4 (makespan of Offline-Greedy) Algorithm Offline-Greedy produces a schedule of length
$O(C + N \log(MN))$ with probability at least $1 - \frac{1}{MN}$.

Proof. From Lemmas 4.1 and 4.3, the frame length $\Phi$ does not suffice to commit transaction $T_{ij}$ within frame
$F_{ij}$ (bad event) with probability at most $(MN)^{-2}$. Considering all the $MN$ transactions in the window, a bad
event occurs with probability at most $MN \cdot (MN)^{-2} = (MN)^{-1}$. Thus, with probability at least $1 - (MN)^{-1}$
all transactions will commit within the frames in which they become high priority. The total time used by any
thread is bounded by $(\alpha + N) \cdot \Phi = O(C + N \log(MN))$. □
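
The last step of the proof can be checked numerically. The following Python sketch computes $\Phi$, $\alpha$ and the total $(\alpha + N) \cdot \Phi$ exactly as defined in Algorithm 1, without any rounding, and is meant only as a sanity check on the arithmetic; the function name `offline_parameters` and the sample values are hypothetical.

```python
import math

def offline_parameters(M, N, C):
    """Return (Phi, alpha, total) with total = (alpha + N) * Phi."""
    log_mn = math.log(M * N)
    phi = 1 + (math.e ** 2 + 2) * log_mn      # frame length Phi
    alpha = C / log_mn                        # number of possible initial delays
    return phi, alpha, (alpha + N) * phi

phi, alpha, total = offline_parameters(M=8, N=16, C=32)
```

Expanding the product gives $C/\ln(MN) + N + (e^2+2)\,C + (e^2+2)\,N \ln(MN)$, which is at most $(e^2 + 3)(C + N \ln(MN))$ whenever $\ln(MN) \ge 1$, in line with the $O(C + N \log(MN))$ bound.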

Since $N$ is a lower bound for the makespan, Theorem 4.4 implies the following competitive ratio for the
$M \times N$ window $W$:

Corollary 1 (competitive ratio of Offline-Greedy) When $C \le N \cdot \ln(MN)$,
$CR(\text{Offline-Greedy}, W) = O(\log(MN))$, with high probability.

The following corollary follows immediately from Lemmas 4.1 and 4.3:

Corollary 2 (response time of Offline-Greedy) The time that a transaction $T_{ij}$ needs to commit from the
moment it starts is $O(C + j \cdot \log(MN))$ with probability at least $1 - \frac{1}{(MN)^2}$.

5 Online Algorithm 

We present Algorithm Online-Greedy (Algorithm 2), which is online in the sense that it does not depend
on knowing the dependency graph to resolve conflicts. This algorithm is similar to Algorithm 1, with the
difference that in the conflict resolution phase we use as a subroutine a variation of Algorithm
RandomizedRounds proposed by Schneider and Wattenhofer [14]. The makespan of the online algorithm is slightly
worse than that of the offline algorithm, since the duration of a frame is now $\Phi' = O(\log^2(MN))$.

Algorithm 2: Online-Greedy

Input: An $M \times N$ window $W$ of transactions with $M$ threads each with $N$ transactions, where $C$ is the
    maximum number of transactions that a transaction can conflict with within the window;
Output: A greedy execution schedule for the window of transactions $W$;

Divide time into time frames of duration $\Phi' = 16 e \Phi \ln(MN)$;
Associate a pair of priorities $(\pi^{(2)}_{ij}, \pi^{(1)}_{ij})$ with each transaction $T_{ij}$;
Each thread $P_i$ chooses a random number $R_i \in [0, \alpha - 1]$ for $\alpha = C / \ln(MN)$;
foreach time step $t = 0, 1, 2, \ldots$ do
    Phase 1: Priority Assignment;
    foreach transaction $T_{ij}$ do
        $F_{ij} \leftarrow R_i + (j - 1)$;
        if $t < F_{ij} \cdot \Phi'$ then
            Priority $\pi^{(2)}_{ij} \leftarrow 1$ (Low);
        else
            Priority $\pi^{(2)}_{ij} \leftarrow 0$ (High);
    Phase 2: Conflict Resolution;
    begin
        if $\pi^{(2)}_{ij} == 0$ ($T_{ij}$ has high priority) then
            On (re)start of transaction $T_{ij}$;
            begin
                $\pi^{(1)}_{ij} \leftarrow$ random integer in $[1, M]$;
            On conflict of transaction $T_{ij}$ with high priority transaction $T_{kl}$;
            begin
                if $\pi^{(1)}_{ij} < \pi^{(1)}_{kl}$ then
                    Abort($T_{ij}$, $T_{kl}$);
                else
                    Abort($T_{kl}$, $T_{ij}$);

There are two different priorities associated with each transaction under this algorithm. The pair of
priorities for a transaction $T_{ij}$ is given as a vector $(\pi^{(2)}_{ij}, \pi^{(1)}_{ij})$, where $\pi^{(2)}_{ij}$ represents the
Boolean priority value low or high (with respective values 1 and 0) as described in Algorithm 1, and
$\pi^{(1)}_{ij} \in [1, M]$ represents the random priorities used in Algorithm RandomizedRounds. The conflicts are
resolved in lexicographical order based on the priority vectors, so that vectors with lower lexicographic order
have higher priority.

When a transaction $T_{ij}$ enters the system, it starts to execute immediately in low priority ($\pi^{(2)}_{ij} = 1$)
until the respective randomly chosen time frame $F_{ij}$ starts, where it switches to high priority ($\pi^{(2)}_{ij} = 0$).
Once in high priority, the field $\pi^{(1)}_{ij}$ is used to resolve conflicts with other high priority transactions. A
transaction chooses a discrete number uniformly at random in the interval $[1, M]$ at the start of the frame
$F_{ij}$, and after every abort. In case of a conflict between $T$ and another high priority transaction $K$ which has
a higher random number ($\pi^{(1)}$) than $T$, $T$ proceeds and $K$ aborts. The procedure Abort($T$, $K$) aborts
transaction $K$, and $K$ must hold off on restarting (i.e., hold off attempting to commit) until $T$ has been
committed or aborted.
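
The lexicographic rule can be illustrated with a short Python sketch. It represents each priority vector as a tuple (Boolean priority, random number), relying on the fact that Python compares tuples lexicographically. The tie-breaking behaviour for equal vectors and the names `resolve_conflict` and `restart_vector` are assumptions made for the illustration.

```python
import random

def restart_vector(rng, M):
    """Priority vector of a high priority transaction on (re)start:
    Boolean entry 0 (high) and a fresh random number in [1, M]."""
    return (0, rng.randint(1, M))

def resolve_conflict(t, k):
    """t and k are (name, priority_vector) pairs.
    Returns (proceeds, aborts); the lexicographically smaller vector wins.
    On a tie the second transaction proceeds (an assumption of this sketch)."""
    (name_t, vec_t), (name_k, vec_k) = t, k
    if vec_t < vec_k:
        return name_t, name_k
    return name_k, name_t

# Two high priority transactions: the smaller random number wins.
outcome_high = resolve_conflict(("T", (0, 3)), ("K", (0, 7)))
# A low priority transaction always loses against a high priority one.
outcome_mixed = resolve_conflict(("T", (1, 1)), ("K", (0, 9)))
```

Because the Boolean entry comes first in the vector, the random number only matters when both transactions are already in high priority.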
5.1 Analysis of Online Algorithm

In the analysis given below, we study the makespan and the response time of Algorithm Online-Greedy.
The analysis is based on the following adaptation of the response time analysis of a one-shot transaction
problem with Algorithm RandomizedRounds [14]. It uses the following Chernoff bound:

Lemma 5.1 (Chernoff bound 2) Let $X_1, X_2, \ldots, X_n$ be independent Poisson trials such that, for
$1 \le i \le n$, $\Pr(X_i = 1) = pr_i$, where $0 < pr_i < 1$. Then, for $X = \sum_{i=1}^{n} X_i$,
$\mu = E[X] = \sum_{i=1}^{n} pr_i$, and any $0 < \delta \le 1$, $\Pr(X < (1 - \delta)\mu) < e^{-\delta^2 \mu / 2}$.

Lemma 5.2 (Adaptation from Schneider and Wattenhofer [14]) Given a one-shot transaction scheduling
problem with $M$ transactions, the time span a transaction $T$ needs from its first start until commit is
$16e(d_T + 1) \log n$ with probability at least $1 - \frac{1}{n^2}$, where $d_T$ is the number of transactions
conflicting with $T$.

Proof. Consider the conflict graph $G$. Let $N_T$ denote the set of conflicting transactions for $T$ (these are the
neighbors of $T$ in $G$). We have $d_T = |N_T| \le m$. Let $y_T$ denote the random priority number choice of $T$ in
range $[1, M]$. The probability that for transaction $T$ no transaction $K \in N_T$ has the same random number
is:



Theorem 5.3 (makespan of Online-Greedy) Algorithm Online-Greedy produces a schedule of length
$O(C \log(MN) + N \log^2(MN))$ with probability at least $1 - \frac{2}{MN}$.

Proof. According to the algorithm, a transaction $T_{ij}$ becomes high priority ($\pi^{(2)}_{ij} = 0$) in frame $F_{ij}$. When
this occurs the transaction will start to compete with other transactions which became high priority during
the same frame. Lemma 4.3 from the analysis of Algorithm 1 implies that the effective degree of $T_{ij}$ with
respect to high priority transactions is $> \Phi - 1$ with probability at most $(MN)^{-2}$ (we call this bad event
A). From Lemma 5.2, if $d_T \le \Phi - 1$, the transaction will not commit within $16e(d_T + 1) \log n \le \Phi'$ time
slots with probability at most $(MN)^{-2}$ (we call this bad event B). Therefore, the bad event that $T_{ij}$ does not
commit in $F_{ij}$ occurs when either bad event A or bad event B occurs, which happens with probability at most
$(MN)^{-2} + (MN)^{-2} = 2(MN)^{-2}$. Considering now all the $MN$ transactions, the probability of failure
is at most $2/MN$. Thus, with probability at least $1 - 2/MN$, every transaction $T_{ij}$ commits during the $F_{ij}$
frame. The total duration of the schedule is bounded by $(\alpha + N)\Phi' = O(C \log(MN) + N \log^2(MN))$. □

Corollary 3 (competitive ratio of Online-Greedy) When $C \le N \cdot \ln(MN)$,
$CR(\text{Online-Greedy}, W) = O(\log^2(MN))$, with high probability.

Corollary 4 (response time of Online-Greedy) The time that a transaction $T_{ij}$ needs to commit from the
moment it starts is $O(C \log(MN) + j \cdot \log^2(MN))$ with probability at least $1 - \frac{2}{(MN)^2}$.

Algorithm 3: Adaptive-Greedy

Input: An $M \times N$ execution window $W$ with $M$ threads each with $N$ transactions, where $C$ is unknown;
Output: A greedy execution schedule for the window of transactions;

Associate a triplet of priorities $(\pi^{(3)}_{ij}, \pi^{(2)}_{ij}, \pi^{(1)}_{ij})$ with each transaction when available for execution;

Code for thread $P_i$:
begin
    Initial contention estimate $C_i \leftarrow 1$;
    repeat
        Online-Greedy($C_i$, $W$);
        if bad event then
            $C_i \leftarrow 2 \cdot C_i$;
    until all transactions are committed;



6 Adaptive Algorithm 

A limitation of Algorithms 1 and 2 is that $C$ needs to be known ahead of time for each window $W$ that the
algorithms are applied to. We show here that it is possible to guess the value of $C$ in a window $W$. We present
Algorithm Adaptive-Greedy (Algorithm 3), which can guess the value of $C$. From the analysis of
Algorithms 1 and 2 we know that knowledge of the value of $C$ plays a vital role in the probability of success of
the algorithms.

In Adaptive-Greedy each thread $P_i$ attempts to guess individually the right value of $C$. The algorithm
is based on the exponential back-off strategy used by many contention managers developed in the
literature, such as Polka. The algorithm works as follows: each thread starts by assuming $C = 1$. Based
on the current estimate of $C$, the thread attempts to execute Algorithm 2 for each of its transactions,
assuming the window size $M \times N$. Now, if the choice of $C$ is correct then each transaction of thread $P_i$ in
the window $W$ should commit within the designated frame in which it becomes high priority.
Thus, all transactions of the thread should commit within the makespan time estimate of Algorithm 2, which
is $t_C = O(C \log(MN) + N \log^2(MN))$. However, if during $t_C$ some transaction of the thread does not commit
within its designated frame (bad event), then thread $P_i$ will assume that the choice of $C$ was incorrect, and will
start over again with the remaining transactions assuming $C = 2C'$, where $C'$ is the previous estimate for
$C$. Eventually thread $P_i$ will guess the correct value of $C$ for the window $W$, and all its transactions will
commit within the respective time.
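
The doubling strategy of a single thread can be sketched as follows. The sketch abstracts a whole run of Algorithm 2 into a callback that reports whether a bad event occurred, which is a simplification: in the text the thread restarts only with its remaining transactions, and success is probabilistic rather than a deterministic function of the estimate. The names `adaptive_estimate` and `run_online_greedy` are hypothetical.

```python
def adaptive_estimate(run_online_greedy, c_start=1):
    """run_online_greedy(c) -> True if no bad event occurred with estimate c.
    Returns (final estimate, number of attempts)."""
    c = c_start
    attempts = 1
    while not run_online_greedy(c):
        c *= 2            # bad event: double the contention estimate
        attempts += 1
    return c, attempts

# Idealized stand-in: a run succeeds as soon as the estimate reaches the true C = 37.
final_c, attempts = adaptive_estimate(lambda c: c >= 37)
```

In this idealized setting the estimate passes through 1, 2, 4, ..., 64, so the number of doublings is logarithmic in the true contention.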

The different threads adapt to the correct value of C independently of each other. At any given moment
the various threads may have assumed different values of C. Threads with a higher estimate of C are
given higher priority in conflicts, since threads with a lower estimate have most likely guessed the
wrong C and are still adapting. In order to handle conflicts each transaction uses a vector of
priorities with three values $(\pi^{(3)}, \pi^{(1)}, \pi^{(2)})$. The value of priority entry
$\pi^{(3)}$ is inversely proportional to the current guess of C for the thread, so that a higher value
of C implies higher priority. The last two entries $\pi^{(1)}$ and $\pi^{(2)}$ are the same as in
Algorithm 2. It is easy to see that the correct choice of C will be reached by a thread $P_i$ within
log C iterations. The total makespan and response time are asymptotically the same as with Algorithm 2.
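
The doubling strategy can be illustrated with a minimal Python sketch. It is not the authors' implementation: the callback `window_succeeds` is a hypothetical stand-in for running the remaining transactions with Algorithm 2 and checking for a bad event, and the toy callback below is deterministic, whereas the real outcome is probabilistic.

```python
def adapt_estimate(window_succeeds, start=1):
    """Double the estimate of C until an attempt over the window succeeds.

    window_succeeds(c) is a hypothetical callback: it runs the remaining
    transactions with estimate c and reports whether every transaction
    committed within its designated frame.
    """
    estimate, attempts = start, 1
    while not window_succeeds(estimate):
        estimate *= 2          # C = 2C', with C' the previous estimate
        attempts += 1
    return estimate, attempts


# Toy stand-in (assumption): an attempt succeeds once the estimate reaches the true C.
true_c = 37
final_estimate, attempts = adapt_estimate(lambda c: c >= true_c)
print(final_estimate, attempts)
```

Under this toy assumption the number of doublings is the ceiling of log2 of the true C, which matches the log C iteration count stated above.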

7 Optimal Window Decomposition 

In this section we are interested in partitioning an M x N window W into a decomposition of
sub-windows such that, if we schedule the transactions of each sub-window separately using one of our
greedy contention managers, the sum of the makespans of the sub-windows is better than scheduling all
the transactions of W as a single window. In particular, we seek a decomposition that minimizes the
maximum density of the sub-windows, where the density expresses how large the contention is relative to
the number of transactions per thread.

For a window W with congestion C we define the density as $r = C/N$. Consider some decomposition D
of window W into sub-windows $D = \{W_1, \ldots, W_k\}$, where sub-window $W_i$ has size
$M \times N_i$. Let $C_i$ denote the contention of sub-window $W_i$. The density of $W_i$ is
$r_i = C_i/N_i$. Let $r_D = \max_{W_i \in D} r_i$. The optimal window decomposition $D^*$ has density
$r_{D^*} = \min_{D \in \mathcal{D}} r_D$, where $\mathcal{D}$ denotes the set of all possible
decompositions of W. Note that different decompositions in $\mathcal{D}$ may have different numbers of
sub-windows. Two example members of $\mathcal{D}$ are the decomposition that consists only of W, and the
one that consists of all single-column windows of W.

The optimal window decomposition $D^*$ can provide an asymptotically better makespan for W if
$r_{D^*} = o(r)$. Using one of our greedy algorithms, the makespan of each sub-window $W_i \in D^*$ is
$\tilde{O}((1 + r_{D^*}) N_i)$ (where the notation $\tilde{O}$ hides polylog factors). Thus, using
$D^*$, the makespan for the whole window W becomes
$\tilde{O}((1 + r_{D^*}) \sum_{W_i \in D^*} N_i) = \tilde{O}((1 + r_{D^*}) N)$. If we apply one of our
greedy algorithms to the whole window W directly, then the makespan for W is $\tilde{O}((1 + r) N)$,
which may be asymptotically worse than using the optimal decomposition $D^*$ when $r_{D^*} = o(r)$.

Figure 2: Optimal window decomposition



We use a dynamic programming approach to compute the optimal decomposition $D^*$ of W. The idea is to
compute the optimal decomposition of all prefix windows of W. As shown in Figure 2, our goal is to
determine the optimal window decomposition of the prefix window up to column k, provided that the
optimal window decompositions up to column k - 1 have already been computed. In this case, there are k
possible combinations to examine in order to find the window size that minimizes the maximum of all the
contention densities. The details are in the proof of the following theorem.

Theorem 7.1 (optimal window decomposition) The optimal window decomposition D* for an arbitrary 
M x N window W can be computed in polynomial time. 

Proof. From the problem description, we can readily see the overlapping-subproblems property in the
optimal window decomposition problem. Let $r_{j,k}$ denote the density of the sub-window $W_{j,k}$,
which starts at column j and ends at column k, where $j \le k$. Let $r^*_{j,k}$ denote the maximum
density in the optimal decomposition of the sub-window $W_{j,k}$. The optimal window decomposition of
the prefix window $W_{1,k}$ can be determined from the recursive formula

$$r^*_{1,k} = \min_{0 \le j < k} \{\max(r^*_{1,j},\, r_{j+1,k})\},$$

with the convention $r^*_{1,0} = 0$ for the empty prefix. To find the optimal window decomposition for
the k-th prefix window $W_{1,k}$, we have to check all the combinations of an optimally decomposed
prefix window ending before column k and the suffix sub-window up to column k. Using the formula we can
compute $r^*_{1,k}$ for each prefix $W_{1,k}$. Our algorithm needs O(k) time to compute the optimal
window size for the k-th prefix, provided that the optimal decompositions up to the (k - 1)-th prefix
are known. To compute all the values for the prefixes from 1 to k, our algorithm therefore takes
$O(k^2)$ steps. The final density is $r_{D^*} = r^*_{1,N}$. □
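
The prefix recursion of the proof can be sketched in Python as follows. This is an illustrative sketch rather than the paper's code: the function name `optimal_decomposition`, the `density(j, k)` callback, and the toy conflict table `cross` are all assumptions introduced here. The toy model assumes one same-column conflict per transaction plus heavy conflicts between columns 1 and 4.

```python
def optimal_decomposition(density, n):
    """density(j, k): density of the sub-window covering columns j..k (1-indexed)."""
    best = [0.0] * (n + 1)   # best[k] = max density of the best split of columns 1..k
    cut = [0] * (n + 1)      # cut[k] = last column before the final sub-window
    for k in range(1, n + 1):
        best[k] = float("inf")
        for j in range(k):   # final sub-window covers columns j+1..k
            candidate = max(best[j], density(j + 1, k))
            if candidate < best[k]:
                best[k], cut[k] = candidate, j
    # Recover the sub-windows by walking the cuts backwards.
    windows, k = [], n
    while k > 0:
        windows.append((cut[k] + 1, k))
        k = cut[k]
    return best[n], windows[::-1]


# Hypothetical toy conflicts: columns 1 and 4 conflict heavily with each other.
cross = {(1, 4): 10, (4, 1): 10}

def density(j, k):
    cols = range(j, k + 1)
    contention = max(1 + sum(cross.get((a, b), 0) for b in cols) for a in cols)
    return contention / (k - j + 1)

result = optimal_decomposition(density, 4)
print(result)
```

In this toy instance the whole window has density 11/4, while splitting it into columns 1-2 and 3-4 separates the heavily conflicting columns and brings the maximum density down to 1/2.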

8 Conclusions 

In this paper, we consider greedy contention managers for transactional memory for M x N windows of
transactions with M threads and N transactions per thread, and present three new algorithms for
contention management in transactional memory from a worst-case perspective. These algorithms are
efficient and adaptive, handle windows of transactions, and improve on the worst-case performance of
previous results. These are the first such results for the execution of sequences of transactions
instead of the one-shot problem used in other literature. Our algorithms present new trade-offs in the
analysis of greedy contention managers for transactional memory. We also show that the optimal window
decomposition can be determined using dynamic programming for any arbitrary window. This work leaves
some issues open for future work. One may consider arbitrary time durations for the transactions
instead of the O(1) time we considered in our analysis. We believe that our results scale by a factor
proportional to the longest transaction duration. Another direction is to explore in depth alternative
algorithms where the randomization does not occur at the beginning of each window but rather during the
execution of the algorithm, by inserting random periods of low priority between the transactions in
each thread. One may also consider the dynamic expansion and contraction of the execution window to
preserve the congestion measure C. Thus, the execution window will not be a part of the algorithm but
only a part of the analysis. This will result in more practical algorithms which at the same time
achieve good performance guarantees.
Abstract

We consider greedy contention managers for transactional memory for M x N windows of transactions
with M threads and N transactions per thread. Assuming that each transaction conflicts with at most C
other transactions inside the window, a trivial greedy contention manager can schedule them within CN
time. In this paper, we show that there are much better schedules. We present and analyze three new
randomized greedy contention management algorithms. In the analysis of these algorithms, we introduce
a new complexity measure that depends only on the number of actual conflicts, which allows more precise
statements about the worst-case complexity bound of any contention management algorithm. The first
algorithm, OFFLINE-RandomizedGreedy, produces a near-optimal schedule of length
$O(C + N \log(NM))$ with high probability and gives competitive ratio $O(\log(NM))$ for $C \le N$. The
second algorithm, ONLINE-RandomizedGreedy, produces a schedule of length
$O(C \log(NM) + N \log^2(NM))$ with high probability, which is only an $O(\log(NM))$ factor worse than
OFFLINE-RandomizedGreedy. Our third algorithm, AdaptiveGreedy, is an adaptive version of the first and
second algorithms which achieves similar worst-case performance despite the unknown value of C, which
is determined on the fly during execution starting from an initial guess. Our results provide new
tradeoffs for greedy transaction scheduling that parameterize window sizes and transaction conflicts
within the window.

Keywords: transactional memory, contention managers, greedy scheduling, execution window. 



1 Introduction 



Multi-core architectures present both an opportunity and a challenge for multi-threaded software. The
opportunity is that threads will be available to an unprecedented degree, and the challenge is that
more programmers will be exposed to concurrency-related synchronization problems that until now were of
concern only to a select few. Writing concurrent programs is difficult because of the complexity of
ensuring proper synchronization. Conventional lock-based synchronization suffers from well-known
limitations, so researchers considered non-blocking transactions as an alternative. Software
Transactional Memory [?, ?, ?] systems use lightweight and composable in-memory software transactions
to address concurrency in multi-threaded systems, ensuring safety at all times [?, ?].



*G. Sharma and B. Estrade are recommended for the best student paper award. 



A contention management strategy is responsible for ensuring that the STM system as a whole makes
progress. If transaction T discovers it is about to conflict with T', it has two choices: it can pause,
giving T' a chance to finish, or it can proceed, forcing T' to abort. To solve this problem
efficiently, T consults the contention manager module about which choice to make. Of particular
interest are greedy contention managers, where a transaction starts again immediately after every
abort. Several (greedy) contention managers have been proposed in the literature. However, most
contention managers have been assessed only experimentally on specific benchmarks. There is a small
amount of work in the literature which formally analyzes the performance of contention managers. The
competitive ratio results are not encouraging since the bounds are not tight. For example, with respect
to the O(s) bound in [?], when the number of resources increases, the performance degrades linearly. A
question arises whether one can achieve tighter bounds. A difficulty in obtaining tight bounds is that
the algorithms studied in [?, ?, ?, ?, ?] apply to the one-shot scheduling problem, where each thread
issues a single transaction. One-shot problems can be related to graph coloring. It can be shown that
the problem of finding the chromatic number of a graph can be reduced to finding an optimal schedule
for a one-shot problem. Since it is known that graph coloring is a very hard problem to approximate,
the one-shot problem is very hard to approximate too [?].

In order to obtain better formal bounds, we propose to investigate execution windows of transactions
(see the left part of Figure 1), which have the potential to overcome the limitations of coloring in
certain circumstances. An M x N window of transactions W consists of M threads with an execution
sequence of N different transactions per thread. Let C denote the maximum number of conflicting
transactions for any transaction in the window (C is the maximum degree of the respective conflict
graph of the window). A straightforward upper bound on the makespan is $\min(CN, MN)$: CN follows from
the observation that each transaction in a thread may be delayed at most C time steps by its
conflicting transactions, and MN follows from the serialization of the transactions. If we partition
the window into N one-shot transaction sets, each of size M, then the competitive ratio using the
one-shot analysis results is O(sN). When we use Algorithm RandomizedRounds [?] N times, the completion
time is in the worst case $O(CN \log n)$ (for some appropriate choice of n).

Figure 1: Execution window model for transactional memory



We have results that indicate that we can obtain better bounds under certain circumstances in the
window. We present two randomized greedy algorithms in which transactions are assigned priority values,
such that for some random initial interval at the beginning of the window W each transaction is in low
priority mode, and after the random period expires the transactions switch to high priority mode. In
high priority mode a transaction can only be aborted by other high priority transactions. The random
initial delays have the property that conflicting transactions are shifted inside their window so
that their execution times may not coincide (see the right part of Figure 1). The benefit is that
conflicting transactions can execute in different time slots, and potentially many conflicts are
avoided. The benefits become more apparent in scenarios where conflicts are more frequent between
transactions in the same column and less frequent between transactions in different columns.

Contributions: We propose the contention measure C within the window to allow more precise statements
about the worst-case complexity bound of any contention management algorithm. We give two window-based
randomized greedy algorithms for contention management in any execution window W. Our first algorithm,
Offline-Greedy, gives a schedule of length $O(C + N \log(MN))$ with high probability, and improves on
one-shot contention managers from a worst-case perspective. The algorithm is offline in the sense that
it explicitly uses the conflict graph of the transactions to resolve the conflicts. Our second
algorithm, Online-Greedy, produces a schedule of length $O(C \log(MN) + N \log^2(MN))$ with high
probability, which is only a factor of $O(\log(MN))$ worse than Offline-Greedy. The benefit of the
online algorithm is that it does not need to know the conflict graph of the transactions to resolve the
conflicts. The online algorithm uses Algorithm RandomizedRounds [?] as a subroutine. We also give a
third algorithm, Adaptive-Greedy, an adaptive version of the previous algorithms which achieves
similar worst-case performance and adaptively guesses the value of the contention measure C.

The technique we use for the analysis of these algorithms is similar to the one used by Leighton et
al. [?] to analyze an online packet scheduling problem. Moreover, one advantage of our algorithms is
that if the conflicts in the window are bounded by $C \le N \log(MN)$, then the upper bounds we have
obtained are within poly-logarithmic factors of optimal, since N is a lower bound for the execution
time. By finding window sizes in the program execution where C is small compared to N, our algorithms
provide better bounds than previously known algorithms.

We prove the existence of an algorithm based on dynamic programming that finds in polynomial time the
optimal decomposition of any arbitrary window W into sub-windows $W_1, \ldots, W_k$, such that the
maximum contention density in each is the smallest possible. The density simply measures how large C is
relative to the number of transactions per thread. By applying our greedy contention management
algorithms in the sub-windows we can obtain schedules which are asymptotically better than executing
the algorithm in the whole window W.

Outline of Paper: The rest of the paper is organized as follows. In Section 2 we discuss the related
work. We present the transactional memory model in Section 3. We present and formally analyze an
offline randomized greedy algorithm in Section 4. The online version is given in Section 5. In Section
6 we describe the adaptive version of the aforementioned algorithms. We discuss the issues of window
decomposition for optimal window generation in Section 7. Section 8 concludes the paper.

2 Related Work 

Transactional Memory (TM) was proposed in the early nineties as an alternative implementation of
mutual exclusion that avoids many of the drawbacks of locks (e.g., deadlock, reliance on the programmer
to associate shared data with locks, priority inversion, and failures of threads while holding locks)
[?]. A few years later the term Software Transactional Memory (STM) was suggested by Shavit and
Touitou [?], and a so-called Dynamic STM (DSTM) for dynamic data structures, which uses a contention
manager as an independent module, was proposed [?]. DSTM is a practical obstruction-free STM system
that seeks advice from the contention manager module to either wait or abort a transaction at the time
of conflict.

Several contention managers have been proposed in the literature. Most of them have been assessed
only on specific benchmarks and not analytically. A comparison of contention managers based on
different benchmarks can be found in [?, ?, ?, ?]. These studies found that the best choice of
contention manager varies with the complexity of the considered benchmark. A more detailed analysis of
the performance of different contention managers on complex benchmarks has recently been carried out by
Ansari et al. [?]. From all the aforementioned references, it has turned out that the coordination cost
and the overhead involved in contention management are very high.

The first formal analysis of the performance of a contention manager was given by Guerraoui et al.
[?], who presented the Greedy contention manager and proved that it achieves an $O(s^2)$ competitive
ratio in comparison to the optimal off-line scheduler for n concurrent transactions that share s
objects. Later, Guerraoui et al. [?] studied the impact of transaction failures on contention
management and proved an $O(ks^2)$ competitive ratio when a running transaction may abort k times
before it eventually commits. Attiya et al. [?] improved the result of [?] to O(s), and the result of
[?] to O(ks), which are significant improvements over the competitive ratio of Greedy. They also proved
a matching lower bound of $\Omega(s)$ on the competitive ratio of deterministic work-conserving
algorithms, which schedule as many transactions as possible.

The complexity measures provided by the aforementioned studies are not satisfying, as they are based
only on the number of shared resources. One can notice that the total number of shared resources is not
really related to the actual conflicting transactions potentially encountered by a transaction.
Recently, Schneider and Wattenhofer [?] analyzed some of the issues related to the number of potential
conflicts, and presented a deterministic algorithm CommitBounds with competitive ratio $\Theta(s)$ for
n concurrent transactions using s shared resources, and a randomized algorithm RandomizedRounds with
makespan $O(C \log n)$, with high probability (proportional to $1 - n^{-1}$), for the one-shot problem
of a set of M transactions in separate threads with C conflicts (assuming unit delays for
transactions). This means that RandomizedRounds is only a factor of log n from optimal, with high
probability, for the case where C < M. However, if other transactions come into play that are able
to reduce the parallelism by a factor of k, the approximation of RandomizedRounds also worsens by a
factor of k. While previous studies showed that the contention managers Polka [?] and SizeMatters [?]
exhibit good overall performance for a variety of benchmarks, this work showed that they may perform
exponentially worse than RandomizedRounds from a worst-case perspective.

3 Execution Window Model 

We consider a model that is based on an M x N execution window W consisting of a set of transactions
$W = \{(T_{11}, \ldots, T_{1N}), (T_{21}, \ldots, T_{2N}), \ldots, (T_{M1}, \ldots, T_{MN})\}$ executed
by M threads running on M processors $P_1, \ldots, P_M$, where each thread issues N transactions in a
sequence. For simplicity of the analysis we assume that a single processor runs one thread only, i.e.,
in total at most M threads are running concurrently. A thread running on processor $P_i$ executes
transactions $T_{i1}, \ldots, T_{iN}$ one after the other, and transaction $T_{ij}$ is executed as soon
as $T_{i(j-1)}$ has completed or committed.

Transactions share a set of objects $\{O_1, \ldots, O_s\}$. Each transaction $T_{ij}$ may use at most
s different objects. Each transaction is a sequence of actions, each of which is either a read of some
shared resource $O_i$, a write to some shared resource $O_k$, a commit, or an abort. Concurrent
write-write actions or read-write actions to shared objects by two or more transactions cause conflicts
between transactions. Each transaction completes with a commit when each action is performed without
conflicts. If conflicts occur, then a transaction either aborts, or it may commit and force all other
conflicting transactions to abort. In a greedy schedule, if a transaction aborts then it immediately
attempts to execute again until it commits.

Each transaction $T_{ij}$ has an execution time duration $\tau_{ij}$ which is greater than 0. Here, for
simplicity, we assume that $\tau_{ij} = 1$, i.e., each transaction needs one time unit to execute. We
also assume that the execution of the transactions starts at time 0 and that the execution time
advances synchronously for all threads step by step. We also assume that all transactions inside the
execution window are correct, i.e., there are no faulty transactions. Our results can be extended by
relaxing these assumptions.

The makespan of a schedule for a set of transactions $\mathcal{T}$ is defined as the duration from the
start of the schedule, i.e., the time when some transaction $T_{ij} \in \mathcal{T}$ is available for
scheduling, until all transactions in $\mathcal{T}$ have committed. The makespan of the transaction
scheduling algorithm for the sequences of transactions can be compared to the makespan of an optimal
off-line scheduling algorithm, which is denoted by OPT. We evaluate the efficiency of our new
contention management algorithms by comparing their makespan with the makespan of the optimal off-line
scheduler.

Definition 1 (Competitive Ratio) The competitive ratio of the combination $(A, \mathcal{T})$ for a
contention management algorithm A under a set of jobs $\mathcal{T}$ is defined as

$$CR(A, \mathcal{T}) = \frac{makespan(A, \mathcal{T})}{makespan(OPT, \mathcal{T})}.$$

Conflict Graph: For a set of transactions $V \subseteq \mathcal{T}$, we use the notion of conflict
graph $G = (V, E)$. The neighbors of a transaction T in the conflict graph are denoted by $N_T$ and
represent all transactions that have a conflict with T in G. The degree $d_T$ of T in the graph
corresponds to the number of its neighbors in the conflict graph, i.e., $d_T = |N_T|$. Note that
$d_T < |V|$. The congestion C of the window W is the largest degree of the conflict graph
$G' = (W, E')$, which consists of all the transactions in the window.
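
As a small illustration of these definitions, the following Python sketch builds a conflict graph from read and write sets and computes the congestion C as its maximum degree. The representation of a transaction as a `(read_set, write_set)` pair and the transaction names in `window` are assumptions made for this example only.

```python
from itertools import combinations


def conflict_graph(transactions):
    """transactions: dict name -> (read_set, write_set). Returns adjacency sets."""
    adj = {t: set() for t in transactions}
    for a, b in combinations(transactions, 2):
        reads_a, writes_a = transactions[a]
        reads_b, writes_b = transactions[b]
        # Write-write or read-write access to a common object is a conflict.
        if writes_a & (reads_b | writes_b) or writes_b & reads_a:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def congestion(adj):
    """Largest degree of the conflict graph (the measure C)."""
    return max((len(neighbors) for neighbors in adj.values()), default=0)


# Hypothetical window with four transactions over objects "x" and "y".
window = {
    "T11": ({"x"}, {"y"}),
    "T12": ({"x"}, set()),
    "T21": ({"y"}, set()),
    "T22": (set(), {"x"}),
}
graph = conflict_graph(window)
print(graph, congestion(graph))
```

Here T11 conflicts with T21 (on y) and with T22 (on x), and T12 conflicts with T22, so the congestion of this toy window is 2.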



4 Offline Algorithm 

We present Algorithm Offline-Greedy (Algorithm 1), which is an offline greedy contention resolution
algorithm that uses the conflict graph explicitly to resolve conflicts of transactions. First, we
divide time into frames of duration $\Phi = O(\ln(MN))$. Then, each thread $P_i$ is assigned an initial
time period consisting of $R_i$ frames (with total duration $R_i \cdot \Phi$), where $R_i$ is chosen
randomly, independently and uniformly, from the range $[0, \alpha - 1]$, where
$\alpha = C/\ln(MN)$. Each transaction has one of two priorities, low or high. Transaction $T_{ij}$ is
initially in low priority. Transaction $T_{ij}$ switches to high priority (or normal priority) in the
first time step of frame $F_{ij} = R_i + (j - 1)$ and remains in high priority thereafter until it
commits. The priorities are used to resolve conflicts. A high priority transaction may only be aborted
by another high priority transaction. A low priority transaction is always aborted if it conflicts with
a high priority transaction.
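
The frame assignment can be sketched in Python as below. This is a hedged illustration, not the paper's code: rounding the frame length and the range size up to integers is an assumption made so that frames align with discrete time steps, the frame length uses the constant given in Algorithm 1, and the function names are hypothetical. The sketch assumes MN > 1 so that the logarithm is positive.

```python
import math
import random


def assign_frames(m, n, c, rng):
    """Return (phi, alpha, frames), where frames[i][j-1] is F_ij = R_i + (j - 1)."""
    log_mn = math.log(m * n)
    phi = math.ceil(1 + (math.e ** 2 + 2) * log_mn)    # frame length, rounded up (assumption)
    alpha = max(1, math.ceil(c / log_mn))              # number of possible delays, rounded up (assumption)
    delays = [rng.randrange(alpha) for _ in range(m)]  # R_i uniform in [0, alpha - 1]
    frames = [[delays[i] + (j - 1) for j in range(1, n + 1)] for i in range(m)]
    return phi, alpha, frames


def priority(t, frame, phi):
    """A transaction is low priority until the first time step of its frame."""
    return "low" if t < frame * phi else "high"


phi, alpha, frames = assign_frames(m=4, n=8, c=20, rng=random.Random(0))
print(phi, alpha, frames[0])
```

Consecutive transactions of a thread receive consecutive frames, so the random delay $R_i$ shifts the whole row of thread i by the same amount.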

Let $G_t$ denote the conflict graph of transactions at time t, where each transaction corresponds to a
node and two transactions are connected with an edge if they conflict in at least one shared resource.
Note that the maximum degree of $G_t$ is bounded by C for the transactions in window W. At each time
step t we select a maximal independent set of transactions in $G_t$ to commit. We first select a
maximal independent set $I_H$ of high priority transactions, then remove this set and its neighbors
from $G_t$, and then select a maximal independent set $I_L$ of low priority transactions from the
remaining conflict graph. The transactions that commit are $I_H \cup I_L$.
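
A minimal Python sketch of this two-step selection follows. The greedy scan in sorted order is just one simple way to obtain a maximal independent set and is an assumption of this example; the text does not prescribe how the set is computed. The graph `adj` and the set `high` are hypothetical inputs.

```python
def maximal_independent_set(nodes, adj):
    """Greedy scan: keep a node if none of its neighbors has been kept already."""
    chosen = set()
    for v in sorted(nodes):
        if not adj[v] & chosen:
            chosen.add(v)
    return chosen


def select_commits(adj, high):
    """Pick high priority transactions first, then compatible low priority ones."""
    high_nodes = {v for v in adj if v in high}
    low_nodes = set(adj) - high_nodes
    i_h = maximal_independent_set(high_nodes, adj)
    blocked = {v for v in low_nodes if adj[v] & i_h}   # low nodes adjacent to I_H
    i_l = maximal_independent_set(low_nodes - blocked, adj)
    return i_h | i_l


# Hypothetical conflict graph: a-b, a-c, c-d; a and b are high priority.
adj = {"a": {"b", "c"}, "b": {"a"}, "c": {"a", "d"}, "d": {"c"}}
commits = select_commits(adj, high={"a", "b"})
print(sorted(commits))
```

In this toy graph the high priority transaction a is selected, which blocks b and the low priority c, and the low priority d can still commit in the same step.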

The intuition behind the algorithm is as follows. Consider a thread i and its first transaction in the
window, $T_{i1}$. According to the algorithm, $T_{i1}$ becomes high priority at the beginning of frame
$F_{i1}$. Because $R_i$ is chosen at random among $\alpha = C/\ln(MN)$ positions, it is expected that
$T_{i1}$ will conflict with at most $O(\ln(MN))$ transactions which become high priority in the same
time frame $F_{i1}$. Since the duration of a time frame is $\Phi = O(\ln(MN))$, transaction $T_{i1}$
and all its high priority conflicting transactions will be able to commit by the end of time frame
$F_{i1}$, using the conflict resolution graph. The initial randomization period of duration
$R_i \cdot \Phi$ will have the same effect on the remaining transactions of thread i, which will also
commit within their chosen high priority frames.
Algorithm 1: Offline-Greedy

Input: An M x N window W of transactions with M threads each with N transactions, where C is the
    maximum number of transactions that a transaction can conflict with within the window;
Output: A greedy execution schedule for the window of transactions W;

Divide time into time frames of duration $\Phi = 1 + (e^2 + 2) \ln(MN)$;
Each thread $P_i$ chooses a random number $R_i \in [0, \alpha - 1]$ for $\alpha = C/\ln(MN)$;
foreach time step t = 0, 1, 2, ... do
    Phase 1: Priority Assignment;
    foreach transaction $T_{ij}$ do
        $F_{ij} \leftarrow R_i + (j - 1)$;
        if $t < F_{ij} \cdot \Phi$ then
            Priority($T_{ij}$) $\leftarrow$ Low;
        else
            Priority($T_{ij}$) $\leftarrow$ High;
    Phase 2: Conflict Resolution;
    begin
        Let $G_t$ be the conflict graph at time t;
        Compute $G_t^H$ and $G_t^L$, the subgraphs of $G_t$ induced by high and low priority nodes, respectively;
        Compute $I_H \leftarrow I(G_t^H)$, a maximal independent set of nodes in graph $G_t^H$;
        $Q \leftarrow$ low priority nodes adjacent to nodes in $I_H$;
        Compute $I_L \leftarrow I(G_t^L - Q)$, a maximal independent set of nodes in graph $G_t^L$ after removing the nodes in Q;
        Commit $I_H \cup I_L$;



4.1 Analysis of Offline Algorithm 

We study two classic efficiency measures for the analysis of our contention management algorithm: (a)
the makespan, which gives the total time to complete all the MN transactions in the window; and (b) the
response time of the system, which gives how much time a transaction takes to commit.

According to the algorithm, when a transaction $T_{ij}$ enters the system, it will be in low priority
until frame $F_{ij}$ starts. As soon as $F_{ij}$ starts, it will enter its respective frame and begin
executing in high priority. Let A denote the set of transactions conflicting with $T_{ij}$. Let
$A' \subseteq A$ denote the subset of conflicting transactions of $T_{ij}$ which become high priority
during frame $F_{ij}$ (simultaneously with $T_{ij}$).

Lemma 4.1 If $|A'| \le \Phi - 1$ then transaction $T_{ij}$ will commit in frame $F_{ij}$.

Proof. Due to the use of the high priority independent sets in the conflict graph $G_t$, if at time t
during frame $F_{ij}$ transaction $T_{ij}$ does not commit, then some conflicting transaction in A'
must commit. Since there are at most $\Phi - 1$ high priority conflicting transactions, and the length
of the frame $F_{ij}$ is $\Phi$, $T_{ij}$ will commit by the end of frame $F_{ij}$. □

We show next that it is unlikely that $|A'| > \Phi - 1$. We will use the following version of the
Chernoff bound:

Lemma 4.2 (Chernoff bound 1) Let $X_1, X_2, \ldots, X_n$ be independent Poisson trials such that, for
$1 \le i \le n$, $\Pr(X_i = 1) = pr_i$, where $0 < pr_i < 1$. Then, for $X = \sum_{i=1}^{n} X_i$,
$\mu = E[X] = \sum_{i=1}^{n} pr_i$, and any $\delta > e^2$, $\Pr(X > \delta\mu) < e^{-\delta\mu}$.

Lemma 4.3 $|A'| > \Phi - 1$ with probability at most $(1/MN)^2$.
Proof. Let $A_k \subseteq A$, where $1 \le k \le M$, denote the set of transactions of thread $P_k$
that conflict with transaction $T_{ij}$. We partition the threads $P_1, \ldots, P_M$ into 3 classes
$Q_0$, $Q_1$, and $Q_2$, such that:

• $Q_0$ contains every thread $P_k$ for which either $|A_k| = 0$, or $|A_k| > 0$ but the positions of
the transactions in $A_k$ are such that it is impossible to overlap with $F_{ij}$ for any random
intervals $R_i$ and $R_k$.

• $Q_1$ contains every thread $P_k$ with $0 < |A_k| < \alpha$, where at least one of the transactions
in $A_k$ is positioned so that it is possible to overlap with frame $F_{ij}$ for some choices of the
random intervals $R_i$ and $R_k$.

• $Q_2$ contains every thread $P_k$ with $\alpha \le |A_k|$. Note that $|Q_2| \le C/\alpha = \ln(MN)$.

Let $Y_k$ be a random binary variable, such that $Y_k = 1$ if in thread $P_k$ any of the transactions
in $A_k$ becomes high priority in $F_{ij}$ (same frame as $T_{ij}$), and $Y_k = 0$ otherwise. Let
$Y = \sum_{k=1}^{M} Y_k$. Note that $|A'| = Y$. Denote $pr_k = \Pr(Y_k = 1)$. We can write
$Y = Z_0 + Z_1 + Z_2$, where $Z_\ell = \sum_{P_k \in Q_\ell} Y_k$ for $0 \le \ell \le 2$. Clearly,
$Z_0 = 0$ and $Z_2 \le |Q_2| \le \ln(MN)$.

Recall that for each thread $P_k$ there is a random initial interval with $R_k$ frames, where $R_k$ is
chosen uniformly at random in $[0, \alpha - 1]$. Therefore, for each $P_k \in Q_1$,
$0 < pr_k \le |A_k|/\alpha < 1$, since there are $|A_k| < \alpha$ conflicting transactions in $A_k$ and
there are at least $\alpha$ random choices for the relative position of transaction $T_{ij}$.
Consequently,

$$\mu = E[Z_1] = \sum_{P_k \in Q_1} pr_k \le \sum_{P_k \in Q_1} \frac{|A_k|}{\alpha} = \frac{1}{\alpha} \sum_{P_k \in Q_1} |A_k| \le \frac{C}{\alpha} \le \ln(MN).$$

By applying the Chernoff bound of Lemma 4.2 we obtain that

$$\Pr(Z_1 > (e^2 + 1)\mu) < e^{-(e^2 + 1)\mu} < e^{-2\ln(MN)} = (MN)^{-2}.$$

Since $Y = Z_0 + Z_1 + Z_2$, and $Z_2 \le \ln(MN)$, we obtain
$\Pr(|A'| = Y > (e^2 + 2)\mu = \Phi - 1) < (MN)^{-2}$, as needed. □

Theorem 4.4 (makespan of Offline-Greedy) Algorithm Offline-Greedy produces a schedule of length
$O(C + N \log(MN))$ with probability at least $1 - \frac{1}{MN}$.

Proof. From Lemmas 4.1 and 4.3, the frame length $\Phi$ does not suffice to commit transaction $T_{ij}$
within frame $F_{ij}$ (bad event) with probability at most $(MN)^{-2}$. Considering all the MN
transactions in the window, a bad event occurs with probability at most
$MN \cdot (MN)^{-2} = (MN)^{-1}$. Thus, with probability at least $1 - (MN)^{-1}$ all transactions will
commit within the frames in which they become high priority. The total time used by any thread is
bounded by $(\alpha + N) \cdot \Phi = O(C + N \log(MN))$. □
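
To get a feel for the constants, the bound $(\alpha + N) \cdot \Phi$ from the proof can be evaluated numerically and compared with the trivial bound $\min(CN, MN)$ from the introduction. The sketch below uses the frame length $\Phi = 1 + (e^2 + 2)\ln(MN)$ of Algorithm 1; the function names and the sample parameters are assumptions for illustration.

```python
import math


def offline_bound(m, n, c):
    """(alpha + N) * Phi with Phi = 1 + (e^2 + 2) ln(MN) and alpha = C / ln(MN)."""
    log_mn = math.log(m * n)
    phi = 1 + (math.e ** 2 + 2) * log_mn
    alpha = c / log_mn
    return (alpha + n) * phi


def trivial_bound(m, n, c):
    """Straightforward upper bound min(CN, MN)."""
    return min(c * n, m * n)


for m, n, c in [(1024, 1024, 512), (8, 8, 4)]:
    print(m, n, c, round(offline_bound(m, n, c)), trivial_bound(m, n, c))
```

For M = N = 1024 and C = 512 the randomized bound is well below the trivial one, whereas for a tiny window such as M = N = 8 and C = 4 the constant factors dominate, which is consistent with the result being asymptotic.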

Since N is a lower bound for the makespan, Theorem 4.4 implies the following competitive ratio for the
M x N window W:

Corollary 1 (competitive ratio of Offline-Greedy) When $C \le N \cdot \ln(MN)$,
$CR(\text{Offline-Greedy}, W) = O(\log(NM))$, with high probability.

The following corollary follows immediately from Lemmas 4.1 and 4.3.

Corollary 2 (response time of Offline-Greedy) The time that a transaction $T_{ij}$ needs to commit from
the moment it starts is $O(C + j \cdot \log(MN))$ with probability at least $1 - \frac{1}{(MN)^2}$.

5 Online Algorithm



We present Algorithm Online-Greedy (Algorithm 2), which is online in the sense that it does not depend
on knowing the dependency graph to resolve conflicts. This algorithm is similar to Algorithm 1, with
the difference that in the conflict resolution phase we use as a subroutine a variation of Algorithm
RandomizedRounds proposed by Schneider and Wattenhofer [?]. The makespan of the online algorithm is
slightly worse than that of the offline algorithm, since the duration of a frame is now
$\Phi' = O(\log^2(MN))$.

There are two different priorities associated with each transaction under this algorithm. The pair of
priorities for a transaction is given as a vector $(\pi^{(1)}, \pi^{(2)})$, where $\pi^{(1)}$
represents the Boolean priority value low or high (with respective values 1 and 0) as described in
Algorithm 1, and $\pi^{(2)} \in [1, M]$ represents the random priorities used in Algorithm
RandomizedRounds. The conflicts are resolved in lexicographic order based on the priority vectors, so
that vectors with lower lexicographic order have higher priority.

When a transaction $T$ enters the system, it starts to execute immediately in low priority ($\pi^{(2)} = 1$)
until its randomly chosen time frame $F$ starts, at which point it switches to high priority ($\pi^{(2)} = 0$).
Once in high priority, the field $\pi^{(1)}$ is used to resolve conflicts with other high priority transactions. A
transaction chooses a discrete number $\pi^{(1)}$ uniformly at random in the interval $[1, M]$ at the start of its frame
$F_{ij}$, and again after every abort. If $T$ conflicts with another high priority transaction $K$ that has a
higher random number $\pi^{(1)}$ than $T$, then $T$ proceeds and $K$ aborts. The procedure Abort(T, K) aborts
transaction $K$, and $K$ must hold off on restarting (i.e., hold off attempting to commit) until $T$ has been
committed or aborted.
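
The conflict rule described above can be sketched in a few lines of Python. This is only an illustrative sketch: the names `Txn`, `level`, `rand`, `on_restart` and `resolve` are hypothetical and do not come from the paper, and the handling of ties (the second transaction wins) is an assumption of the sketch.

```python
import random
from dataclasses import dataclass

HIGH, LOW = 0, 1  # Boolean priority values from the text: high = 0, low = 1


@dataclass
class Txn:
    name: str
    level: int = LOW  # Boolean priority: LOW until the transaction's frame starts
    rand: int = 0     # random number in [1, M], used among high priority transactions

    def priority(self):
        # Lexicographic priority vector; a smaller vector means higher priority.
        return (self.level, self.rand)


def on_restart(txn, M, rng=random):
    """Draw a fresh random number at frame start and after every abort."""
    txn.rand = rng.randint(1, M)


def resolve(t, k):
    """Return (winner, loser) for a conflict between t and k.

    The loser is the transaction that would be aborted. On equal vectors
    the second argument wins, which is an assumption of this sketch.
    """
    if t.priority() < k.priority():
        return t, k
    return k, t
```

For example, with `t = Txn("T", HIGH, 3)` and `k = Txn("K", HIGH, 7)`, `resolve(t, k)` returns `(t, k)`, so `K` is the transaction to abort. A low priority transaction always loses against a high priority one because the Boolean entry is compared first.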

5.1 Analysis of Online Algorithm 

In the analysis given below, we study the makespan and the response time of Algorithm Online-Greedy. 
The analysis is based on the following adaptation of the response time analysis of a one-shot transaction 
problem with Algorithm RandomizedRounds [?]. It uses the following Chernoff bound: 

Lemma 5.1 (Chernoff bound 2) Let $X_1, X_2, \ldots, X_n$ be independent Poisson trials such that, for $1 \le i \le n$,
$\Pr(X_i = 1) = pr_i$, where $0 < pr_i < 1$. Then, for $X = \sum_{i=1}^{n} X_i$, $\mu = E[X] = \sum_{i=1}^{n} pr_i$, and any
$0 < \delta < 1$, $\Pr(X < (1 - \delta)\mu) < e^{-\delta^2 \mu / 2}$.

Lemma 5.2 (Adaptation from Schneider and Wattenhofer [1]) Given a one-shot transaction scheduling
problem with $M$ transactions, the time span a transaction $T$ needs from its first start until commit is
$16e(d_T + 1)\log n$ with probability at least $1 - \frac{1}{n^2}$, where $d_T$ is the number of transactions conflicting
with $T$.

Proof. Consider the conflict graph $G$. Let $N_T$ denote the set of conflicting transactions for $T$ (these are the
neighbors of $T$ in $G$). We have $d_T = |N_T| < M$. Let $y_T$ denote the random priority number choice of $T$ in
range $[1, M]$. The probability that for transaction $T$ no transaction $K \in N_T$ has the same random number
is:

$$\Pr(\nexists K \in N_T : y_K = y_T) = \left(1 - \frac{1}{M}\right)^{d_T} \ge \left(1 - \frac{1}{M}\right)^{M-1} \ge \frac{1}{e}.$$

The probability that $y_T$ is at least as small as $y_K$ for any transaction $K \in N_T$ is $\frac{1}{d_T + 1}$. Thus, the chance that
$y_T$ is smallest and different among all its neighbors in $N_T$ is at least $\frac{1}{e(d_T + 1)}$. If we conduct $16e(d_T + 1)\ln n$
trials, each having success probability $\frac{1}{e(d_T + 1)}$, then the probability that the number of successes $Z$ is less
than $8\ln n$ becomes: $\Pr(Z < 8\ln n) < e^{-2\ln n} = \frac{1}{n^2}$, using the Chernoff bound of Lemma 5.1. □
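
The constants in the last step of the proof can be checked by hand. The short derivation below is a worked instance of Lemma 5.1, under the assumption that the bound is applied with $\delta = 1/2$ and that each trial succeeds with probability exactly $\frac{1}{e(d_T+1)}$ (the worst case, since the true success probability is at least this large).

```latex
\begin{align*}
\mu &= 16e(d_T+1)\ln n \cdot \frac{1}{e(d_T+1)} = 16\ln n, \\
(1-\delta)\mu &= \tfrac{1}{2}\cdot 16\ln n = 8\ln n, \\
\Pr(Z < 8\ln n) &< e^{-\delta^2\mu/2}
  = e^{-\frac{1}{4}\cdot\frac{16\ln n}{2}}
  = e^{-2\ln n} = \frac{1}{n^2}.
\end{align*}
```
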
Algorithm 2: Online-Greedy

Input: An $M \times N$ window $W$ of transactions with $M$ threads each with $N$ transactions, where $C$ is the
    maximum number of transactions that a transaction can conflict with inside the window;
Output: A greedy execution schedule for the window of transactions $W$;

Divide time into time frames of duration $\Phi' = 16e\Phi\ln(MN)$;
Associate pair of priorities $(\pi^{(2)}_{ij}, \pi^{(1)}_{ij})$ to each transaction $T_{ij}$;
Each thread $P_i$ chooses a random number $R_i \in [0, \alpha - 1]$ for $\alpha = C/\ln(NM)$;
foreach time step $t = 0, 1, 2, \ldots$ do
    Phase 1: Priority Assignment;
    foreach transaction $T_{ij}$ do
        $F_{ij} \leftarrow R_i + (j - 1)$;
        if $t < F_{ij} \cdot \Phi'$ then
            Priority $\pi^{(2)}_{ij} \leftarrow 1$ (Low);
        else
            Priority $\pi^{(2)}_{ij} \leftarrow 0$ (High);
    Phase 2: Conflict Resolution;
    begin
        if $\pi^{(2)}_{ij} == 0$ ($T_{ij}$ has high priority) then
            On (re)start of transaction $T_{ij}$;
            begin
                $\pi^{(1)}_{ij} \leftarrow$ random integer in $[1, M]$;

Theorem 5.3 (makespan of Online-Greedy) Algorithm Online-Greedy produces a schedule of length
$O(C\log(MN) + N\log^2(MN))$ with probability at least $1 - \frac{2}{MN}$.

Proof. According to the algorithm, a transaction $T_{ij}$ becomes high priority ($\pi^{(2)}_{ij} = 0$) in frame $F_{ij}$. When
this occurs the transaction will start to compete with other transactions which became high priority during
the same frame. Lemma 4.1 from the analysis of Algorithm 1 implies that the effective degree of $T_{ij}$ with
respect to high priority transactions is $d_T > \Phi - 1$ with probability at most $(MN)^{-2}$ (we call this bad event
A). From Lemma 5.2 with $n = MN$, if $d_T \le \Phi - 1$, the transaction will not commit within
$16e(d_T + 1)\log n \le \Phi'$ time slots with probability at most $(MN)^{-2}$ (we call this bad event B). Therefore, the
bad event that $T_{ij}$ does not commit in $F_{ij}$ occurs when either bad event A or bad event B occurs, which happens
with probability at most $(MN)^{-2} + (MN)^{-2} = 2(MN)^{-2}$. Considering now all the $MN$ transactions, a union
bound shows that the probability of failure is at most $2/MN$. Thus, with probability at least $1 - 2/MN$, every
transaction $T_{ij}$ commits during the $F_{ij}$ frame. The total duration of the schedule is bounded by
$(\alpha + N)\Phi' = O(C\log(MN) + N\log^2(MN))$. □
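
To make the final bound concrete, the sketch below evaluates $\Phi'$ and $(\alpha + N)\Phi'$ for given parameters. The function name `online_greedy_schedule_bound` is hypothetical, and the offline frame duration $\Phi$ is passed in as the parameter `phi` because its exact constant is defined in the analysis of Algorithm 1 and is not restated here.

```python
import math


def online_greedy_schedule_bound(M, N, C, phi):
    """Evaluate the frame length Phi' and the (alpha + N) * Phi' schedule bound.

    phi is the offline frame duration Phi, supplied by the caller.
    Requires M * N > 1 so that ln(MN) is positive.
    """
    log_mn = math.log(M * N)
    phi_prime = 16 * math.e * phi * log_mn  # Phi' = 16 e Phi ln(MN)
    alpha = C / log_mn                      # alpha = C / ln(MN)
    return phi_prime, (alpha + N) * phi_prime
```

Note that the contention term $\alpha\Phi'$ simplifies to $16e\Phi C$, so with $\Phi = O(\log(MN))$ it contributes the $C\log(MN)$ part of the bound, while $N\Phi'$ contributes the $N\log^2(MN)$ part.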

Corollary 3 (competitive ratio of Online-Greedy) When $C \le N \cdot \ln(MN)$, the competitive ratio of
Online-Greedy on window $W$ is $O(\log^2(NM))$, with high probability.




            On conflict of transaction $T_{ij}$ with high priority transaction $T_{kl}$;
            begin
                if $\pi^{(1)}_{ij} < \pi^{(1)}_{kl}$ then
                    Abort($T_{ij}$, $T_{kl}$);
                else
                    Abort($T_{kl}$, $T_{ij}$);
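
Phase 1 of Algorithm 2 (frame selection and the switch from low to high priority) can be sketched as follows. The helper names `assign_frames` and `is_high_priority` are hypothetical, and rounding $\alpha$ up to an integer of at least 1 is an assumption of this sketch, since the pseudocode does not say how a fractional $\alpha$ is handled.

```python
import math
import random


def assign_frames(M, N, C, rng=random):
    """Return (alpha, frames) with frames[i][j] = R_i + j for 0-based j.

    R_i is drawn uniformly from [0, alpha - 1], once per thread.
    Requires M * N > 1 so that ln(MN) is positive.
    """
    alpha = max(1, math.ceil(C / math.log(M * N)))
    frames = []
    for _ in range(M):
        r_i = rng.randrange(alpha)
        # The 0-based j here plays the role of (j - 1) in the pseudocode.
        frames.append([r_i + j for j in range(N)])
    return alpha, frames


def is_high_priority(t, frame, phi_prime):
    """A transaction is low priority while t < F_ij * Phi', high afterwards."""
    return t >= frame * phi_prime
```

Because every thread shifts all of its transactions by the same random offset $R_i$, consecutive transactions of a thread occupy consecutive frames, and the last frame index used is at most $\alpha + N - 2$.
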
Algorithm 3: Adaptive-Greedy

Input: An $M \times N$ execution window $W$ with $M$ threads each with $N$ transactions, where $C$ is unknown;
Output: A greedy execution schedule for the window of transactions;

Associate triplet of priorities $(\pi^{(3)}_{ij}, \pi^{(2)}_{ij}, \pi^{(1)}_{ij})$ to each transaction $T_{ij}$ when available for execution;

Code for thread $P_i$;
begin
    Initial contention estimate $C_i \leftarrow 1$;
    repeat
        Online-Greedy($C_i$, $W$);
        if bad event then
            $C_i \leftarrow 2 \cdot C_i$;
    until all transactions are committed;


Corollary 4 (response time of Online-Greedy) The time that a transaction $T_{ij}$ needs to commit from the
moment it starts is $O(C\log(MN) + j \cdot \log^2(MN))$ with probability at least $1 - \frac{2}{(MN)^2}$.

6 Adaptive Algorithm 

A limitation of Algorithms 1 and 2 is that $C$ needs to be known ahead of time for each window $W$ that the
algorithms are applied to. We show here that it is possible to guess the value of $C$ in a window $W$. We present
Algorithm Adaptive-Greedy (Algorithm 3), which can guess the value of $C$. From the analysis of
Algorithms 1 and 2 we know that knowledge of the value of $C$ plays a vital role in the probability of success of
the algorithms.

In Adaptive-Greedy each thread $P_i$ attempts to guess individually the right value of $C$. The algorithm
is based on the exponential back-off strategy used by many contention managers developed in the
literature, such as Polka. It works as follows: each thread starts by assuming $C = 1$. Based
on the current estimate of $C$, the thread attempts to execute Algorithm 2 for each of its transactions,
assuming the window size $M \times N$. If the choice of $C$ is correct, then each transaction of thread $P_i$ in
the window $W$ should commit within the designated frame in which it becomes high priority.
Thus, all transactions of the thread should commit within the makespan time estimate of Algorithm 2, which
is $\tau_C = O(C\log(MN) + N\log^2(MN))$. However, if during $\tau_C$ some transaction does not commit within
its designated frame (bad event), then thread $P_i$ will assume that the choice of $C$ was incorrect, and will
start over with the remaining transactions assuming $C = 2C'$, where $C'$ is the previous estimate for
$C$. Eventually thread $P_i$ will guess the correct value of $C$ for the window $W$, and all its transactions will
commit within the respective time.
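
The doubling strategy can be illustrated with a deterministic Python sketch. The function `adaptive_estimates` is hypothetical, and it makes a simplifying assumption: a bad event is taken to occur exactly when the current estimate is below the true contention, whereas in the algorithm bad events are random and are detected from missed frames.

```python
def adaptive_estimates(true_c):
    """Return the sequence of contention estimates a thread tries.

    Assumption of this sketch: a bad event happens exactly when the
    estimate is smaller than the true contention true_c.
    """
    estimate = 1
    tried = [estimate]
    while estimate < true_c:  # stand-in for detecting a bad event
        estimate *= 2         # C <- 2 * C'
        tried.append(estimate)
    return tried
```

For a true contention of 5 this returns `[1, 2, 4, 8]`: three doublings, and a final estimate below twice the true value. Under the stated assumption the number of doublings is $\lceil \log_2 C \rceil$.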

The different threads adapt independently from each other to the correct value of $C$. At the same moment
in time the various threads may have assumed different values of $C$. The threads with a higher estimate of
$C$ will be given higher priority in conflicts, since threads with a lower $C$ most likely have guessed the wrong
$C$ and are still adapting. In order to handle conflicts each transaction uses a vector of priorities with three
values $(\pi^{(3)}, \pi^{(2)}, \pi^{(1)})$. The value of priority entry $\pi^{(3)}$ is inversely proportional to the current guess of $C$
for the thread, so that a higher value of $C$ implies higher priority. The last two entries $\pi^{(2)}$ and $\pi^{(1)}$ are the
same as in Algorithm 2. It is easy to see that the correct choice of $C$ will be reached by a thread $P_i$ within $\log C$
iterations. The total makespan and response time are asymptotically the same as with Algorithm 2.
7 Optimal Window Decomposition



In this section we are interested in partitioning an $M \times N$ window $W$ into some decomposition of
sub-windows such that if we schedule the transactions of each sub-window separately using one of our greedy
contention managers then the sum of the makespans of the sub-windows is better than scheduling all the
transactions of $W$ as a single window. In particular we are seeking a decomposition that minimizes the
maximum density of the sub-windows, where the density expresses how much larger the contention is with
respect to the number of transactions per thread.

For window $W$ with congestion $C$ we define the density as $r = C/N$. Consider some decomposition
$D$ of window $W$ into different sub-windows $D = \{W_1, \ldots, W_k\}$, where sub-window $W_i$ has respective
size $M \times X_i$. Let $C_i$ denote the contention of window $W_i$. The density of $W_i$ is $r_i = C_i/X_i$. Let
$r_D = \max_{W_i \in D} r_i$. The optimal window decomposition $D^*$ has density $r_{D^*} = \min_{D \in \mathcal{D}} r_D$, where $\mathcal{D}$ denotes the
set of all possible decompositions of $W$. Note that different decompositions in $\mathcal{D}$ may have different numbers
of windows. Two example members of $\mathcal{D}$ are the decomposition that consists only of $W$, and the one that
consists of all single-column windows of $W$.

The optimal window decomposition $D^*$ can provide asymptotically better makespan for $W$ if $r_{D^*} =
o(r)$. Using one of our greedy algorithms, the makespan of each sub-window $W_i \in D^*$ is $\tilde{O}((1 + r_{D^*})X_i)$
(where the notation $\tilde{O}$ hides polylog factors). Thus, using $D^*$, the makespan for the whole window $W$
becomes $\tilde{O}((1 + r_{D^*}) \sum_{W_i \in D^*} X_i) = \tilde{O}((1 + r_{D^*})N)$. If we apply one of our greedy algorithms to the
whole window $W$ directly, then the makespan for $W$ is $\tilde{O}((1 + r)N)$, which may be asymptotically worse
than using the optimal decomposition $D^*$ when $r_{D^*} = o(r)$.
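
As a small illustration of these definitions, the Python sketch below computes the density of a window and of a decomposition. The function names and the numbers in the example are hypothetical; the contention values are supplied by the caller, since computing them requires the conflict graph.

```python
def density(contention, width):
    """Density r = C / N of a window with the given contention and column count."""
    return contention / width


def decomposition_density(subwindows):
    """Density of a decomposition: the maximum of C_i / X_i over its sub-windows.

    subwindows is a list of (C_i, X_i) pairs supplied by the caller.
    """
    return max(density(c, x) for c, x in subwindows)
```

For a hypothetical window with $N = 8$ and $C = 16$, `density(16, 8)` is 2.0. If splitting it into two sub-windows of four columns each leaves contentions 2 and 4, then `decomposition_density([(2, 4), (4, 4)])` is 1.0, so the factor $(1 + r)$ in the makespan bound drops from 3 to 2, ignoring polylog factors.
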
Figure 2: Optimal window decomposition



We use a dynamic programming approach to compute the optimal decomposition $D^*$ of $W$. The idea
is to compute the optimal decomposition of all prefix windows of $W$. As shown in Figure 2, our goal is to
determine the optimal window decomposition including the prefix window up to column $k$, provided that
the optimal window decomposition up to column $k - 1$ has already been computed. In this case, there are $k$
possible combinations to examine for finding the optimal window size which will minimize the maximum
of all the contention densities. The details are in the proof of the following theorem.
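
The prefix-based dynamic program described above can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation: `optimal_decomposition` is a hypothetical name, and `density(j, k)` is a caller-supplied function assumed to return the density of the sub-window spanning columns `j` to `k` (1-based, inclusive).

```python
def optimal_decomposition(n_cols, density):
    """Minimise the maximum sub-window density over all column decompositions.

    density(j, k) is a caller-supplied helper giving the density of the
    sub-window on columns j..k (1-based, inclusive).
    Returns (best maximum density, list of (start, end) column ranges).
    """
    best = [0.0] * (n_cols + 1)  # best[k]: optimum for the prefix of k columns
    cut = [0] * (n_cols + 1)     # cut[k]: last column of the preceding prefix
    for k in range(1, n_cols + 1):
        best[k] = float("inf")
        for i in range(k):       # k candidates: the last sub-window is i+1..k
            cand = max(best[i], density(i + 1, k))
            if cand < best[k]:
                best[k], cut[k] = cand, i
    # Walk the recorded cuts backwards to recover the sub-windows.
    windows, k = [], n_cols
    while k > 0:
        windows.append((cut[k] + 1, k))
        k = cut[k]
    return best[n_cols], windows[::-1]
```

The inner loop examines the $k$ candidate positions for the last sub-window of the $k$-th prefix, so the whole table is filled with $O(N^2)$ calls to `density`.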

Theorem 7.1 (optimal window decomposition) The optimal window decomposition D* for an arbitrary 
M x N window W can be computed in polynomial time. 

Proof. From the problem description, we can readily see the overlapping-subproblems property in the
optimal window decomposition problem. Let $r_{j,k}$ denote the density of the sub-window $W_{j,k}$, which starts
at column $j$ and ends at column $k$, where $j \le k$. Let $r^*_{j,k}$ denote the maximum
density in the optimal decomposition of the sub-window $W_{j,k}$. The optimal window decomposition in this
scenario can be determined from this recursive formula:

$$r^*_{j,k} = \min\Big( r_{j,k},\ \min_{i \,:\, j \le i \le k-1} \max\big(r^*_{j,i},\ r_{i+1,k}\big) \Big).$$

To find the optimal window decomposition for the $k$-th prefix window $W_{1,k}$, we have to check all the
combinations of one of the first $k - 1$ prefix windows with the suffix that ends at column $k$. Using the formula
we can compute $r^*_{1,k}$ for each prefix $W_{1,k}$. Our algorithm needs $O(k)$ time to compute the optimal window size
for the $k$-th prefix provided that the optimal window computation up to the $(k - 1)$-th prefix is known. To compute
all the values for each window combination from 1 to $k$, our algorithm therefore takes $O(k^2)$ steps. The
final density is $r_{D^*} = r^*_{1,N}$. □

8 Conclusions

In this paper, we consider greedy contention managers for transactional memory for $M \times N$ windows of
transactions with $M$ threads and $N$ transactions per thread, and present three new algorithms for contention
management in transactional memory from a worst-case perspective. These algorithms are efficient,
adaptive, and handle windows of transactions, and they improve on the worst-case performance of previous results.
These are the first such results for the execution of sequences of transactions instead of the one-shot problem,
and they present new trade-offs in the analysis of greedy contention managers for transactional memory. We
also show that the optimal window decomposition can be determined using dynamic programming for an
arbitrary window. This work leaves some issues for future work. One may consider arbitrary
time durations for the transactions instead of the $O(1)$ time we considered in our analysis.
Another direction is to explore in depth alternative algorithms where the randomization does not occur at
the beginning of each window but rather during the execution of the algorithm, by inserting random periods
of low priority between the transactions in each thread. One may also consider the dynamic expansion and
contraction of the execution window to preserve the congestion measure $C$. Thus, the execution window
will not be a part of the algorithm but only a part of the analysis. This will result in more practical algorithms
which at the same time achieve good performance guarantees.